ConvertFromTextToUnicode
Converts a string from any encoding to Unicode.
pascal OSStatus ConvertFromTextToUnicode (
    TextToUnicodeInfo iTextToUnicodeInfo,
    ByteCount iSourceLen,
    ConstLogicalAddress iSourceStr,
    OptionsBits iControlFlags,
    ItemCount iOffsetCount,
    ByteOffset iOffsetArray[],
    ItemCount *oOffsetCount,
    ByteOffset oOffsetArray[],
    ByteCount iOutputBufLen,
    ByteCount *oSourceRead,
    ByteCount *oUnicodeLen,
    UniCharArrayPtr oUnicodeStr);
iTextToUnicodeInfo
- A Unicode converter object of type TextToUnicodeInfo containing mapping and state information used for the conversion. Your application obtains a Unicode converter object using the function CreateTextToUnicodeInfo (page 125).
iSourceLen
- The length in bytes of the source string to be converted.
iSourceStr
- The address of the source string to be converted.
iControlFlags
- Conversion control flags. You can use these bitmasks to set the control flags that apply to this function: kUnicodeUseFallbacksMask, kUnicodeLooseMappingsMask, kUnicodeKeepInfoMask, kUnicodeStringUnterminatedMask, kUnicodeForceASCIIRangeMask, and kUnicodeNoHalfwidthCharsMask. See "Conversion Control Flags" (page 110).
iOffsetCount
- The number of offsets in the iOffsetArray parameter. Your application supplies this value. The number of entries in iOffsetArray must be fewer than the number of bytes specified in iSourceLen. If you don't want offsets returned to you, specify 0 (zero) for this parameter.
iOffsetArray
- An array of type ByteOffset. On input, you specify the array that contains an ordered list of significant byte offsets pertaining to the source string. These offsets may identify font or style changes, for example, in the source string. All array entries must be less than the length in bytes specified by the iSourceLen parameter. If you don't want offsets returned to your application, specify NULL for this parameter and 0 (zero) for iOffsetCount.
oOffsetCount
- A pointer to a value of type ItemCount. On output, this value contains the number of offsets that were mapped in the output stream.
oOffsetArray
- An array of type ByteOffset. On output, this array contains the corresponding new offsets for the Unicode string produced by the converter.
iOutputBufLen
- The length in bytes of the output buffer pointed to by the oUnicodeStr parameter. Your application supplies this buffer to hold the returned converted string. The oUnicodeLen parameter may return a byte count that is less than this value if the converted byte string is smaller than the buffer size you allocated. The relationship between the size of the source string and the Unicode string is complex and depends on the source encoding and the contents of the string.
oSourceRead
- A pointer to a value of type ByteCount. On output, this value contains the number of bytes of the source string that were converted. If the function returns a kTECUnmappableElementErr result code, this parameter returns the number of bytes that were converted before the error occurred.
oUnicodeLen
- A pointer to a value of type ByteCount. On output, this value contains the length in bytes of the converted stream.
oUnicodeStr
- A pointer to an array used to hold a Unicode string. On input, this value points to the beginning of the array for the converted string. On output, this buffer holds the converted Unicode string. (For guidelines on estimating the size of the buffer needed, see the following discussion.) For a description of the UniCharArrayPtr data type, see Chapter 2, "Basic Text Types Reference."
function result
- A result code. See "Text Encoding Conversion Manager Result Codes" (page 42) in the chapter "Basic Text Types Reference."
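The input constraints on iOffsetCount and iOffsetArray described above can be checked before calling the function. The following is a minimal sketch, not part of the Text Encoding Conversion Manager: the helper name offsets_are_valid is hypothetical, the typedefs are stand-ins for the system definitions, and "ordered" is assumed to mean ascending with equal neighbors allowed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the Mac OS types named in the parameter list. These
   typedefs are assumptions; remove them when building against the
   real system headers. */
typedef unsigned long ByteCount;
typedef unsigned long ByteOffset;
typedef unsigned long ItemCount;

/* Hypothetical helper. Returns true if the offsets satisfy the documented
   input constraints:
   - a count of zero means no offset mapping is requested,
   - the count is smaller than the source length in bytes,
   - entries are in ascending order (assumed meaning of "ordered"),
   - every entry is less than the source length in bytes. */
bool offsets_are_valid(const ByteOffset *offsets, ItemCount count,
                       ByteCount source_len)
{
    if (count == 0)
        return true;                  /* no offset mapping requested */
    if (offsets == NULL || count >= source_len)
        return false;
    for (ItemCount i = 0; i < count; i++) {
        if (offsets[i] >= source_len)
            return false;             /* offset past the end of the source */
        if (i > 0 && offsets[i] < offsets[i - 1])
            return false;             /* list is not ordered */
    }
    return true;
}
```

An application might call such a check on its style-run offsets and fall back to passing NULL and 0 (zero) when the check fails.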
DISCUSSION
The ConvertFromTextToUnicode function converts a text string in a non-Unicode encoding to Unicode. You specify the source string's encoding in the Unicode mapping structure that you pass to the function CreateTextToUnicodeInfo (page 125) to obtain a Unicode converter object for the conversion. You pass the Unicode converter object returned by CreateTextToUnicodeInfo to ConvertFromTextToUnicode as the iTextToUnicodeInfo parameter.
In addition to converting a text string in any encoding to Unicode, the ConvertFromTextToUnicode function can map offsets for style or font information from the source text string to the returned converted string. The converter reads the application-supplied offsets, which apply to the source string, and returns the corresponding new offsets in the converted string. If you do not want the offsets at which font or style information occurs mapped to the resulting string, you should pass NULL for iOffsetArray and 0 (zero) for iOffsetCount.
Your application must allocate a buffer to hold the resulting converted string and pass a pointer to the buffer in the oUnicodeStr parameter. To determine the size of the output buffer to allocate, you should consider the size of the source string, its encoding type, and its content in relation to the resulting Unicode string.
For example, for 1-byte encodings, such as MacRoman, the Unicode string will be at least double the size (more if it uses noncomposed Unicode); for MacArabic and MacHebrew, the corresponding Unicode string could be up to six times as big. For most 2-byte encodings, for example Shift-JIS, the Unicode string will be less than double the size. For international robustness, your application should allocate a buffer three to four times larger than the source string. If the output Unicode text is actually UTF-8--which could occur beginning with the current release of the Text Encoding Conversion Manager, version 1.2.1--the UTF-8 buffer pointer must be cast to UniCharArrayPtr before it can be passed as the oUnicodeStr parameter. Also, the output buffer length will have a wider range of variation than for UTF-16; for ASCII input, the output will be the same size; for Han input, the output will be twice as big, and so on.
The function returns a
noErr result code if it has completely converted the input string to Unicode without using fallback characters. If the function returns the paramErr, kTECTableFormatErr, or kTECGlobalsUnavailableErr result codes, it did not convert the string.
If the function returns kTECBufferBelowMinimumSizeErr, the output buffer was too small to allow conversion of any part of the input string. You need to increase the size of the output buffer and try again.
If the function returns the kTECUsedFallbacksStatus result code, the function has completely converted the string using one or more fallback characters. This can only happen if you set the Unicode-use-fallbacks control flag.
If the function returns kTECOutputBufferFullErr, the output buffer was not big enough to completely convert the input; oSourceRead indicates the amount of input converted. You can call the function again with another output buffer--or with the same output buffer, after copying its contents--to convert the remainder of the input string.
If the function returns kTECPartialCharErr, the input buffer ended with an incomplete multibyte character. If you have subsequent input text available, you can append the unconverted input from this call to the beginning of the subsequent input text and call the function again.
If the function returns kTECUnmappableElementErr because an input text element could not be mapped to Unicode, then the function did not completely convert the input string. This can only happen if you did not set the Unicode-use-fallbacks control flag. You can set this flag and then convert the remaining unconverted input, or take some other action.
SPECIAL CONSIDERATIONS
This function modifies the contents of the Unicode converter object you pass in the iTextToUnicodeInfo parameter.
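The buffer-sizing guideline and the retry behavior for a full output buffer described in the discussion can be sketched as follows. This is an illustration under stated assumptions rather than Text Encoding Conversion Manager code: ConvertStep is a hypothetical wrapper standing in for one call to ConvertFromTextToUnicode with the converter object, flags, and offset arguments already bound; the SKETCH_ status values are placeholders, not the real result code values; the typedefs are stand-ins for the system types; and toy_step and append_chunk exist only to exercise the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int OSStatus;               /* stand-in for the system type */
typedef unsigned long ByteCount;    /* stand-in for the system type */

/* Placeholder status values for this sketch only. Real code would compare
   against noErr and kTECOutputBufferFullErr from the system headers. */
enum { SKETCH_NO_ERR = 0, SKETCH_OUTPUT_BUFFER_FULL = 1 };

/* Hypothetical wrapper around one conversion call. It is assumed to set
   *src_read and *out_len on every return. */
typedef OSStatus (*ConvertStep)(const unsigned char *src, ByteCount src_len,
                                unsigned char *out, ByteCount out_cap,
                                ByteCount *src_read, ByteCount *out_len);

/* Hypothetical sink that copies each converted chunk somewhere safe. */
typedef void (*ChunkSink)(void *ctx, const unsigned char *chunk, ByteCount len);

/* Output capacity at the upper end of the "three to four times" guideline.
   Returns 0 if the multiplication would overflow. */
ByteCount suggested_output_len(ByteCount src_len)
{
    const ByteCount factor = 4;
    return (src_len > (ByteCount)-1 / factor) ? 0 : src_len * factor;
}

/* Converts the source by reusing one output buffer. After each step the
   converted bytes go to the sink, then the unread input is resubmitted
   for as long as the step reports a full output buffer. */
OSStatus convert_in_chunks(ConvertStep step, ChunkSink sink, void *sink_ctx,
                           const unsigned char *src, ByteCount src_len,
                           unsigned char *out, ByteCount out_cap,
                           ByteCount *total_read)
{
    ByteCount done = 0;
    OSStatus status;
    assert(step != NULL && sink != NULL && total_read != NULL);
    do {
        ByteCount src_read = 0, out_len = 0;
        status = step(src + done, src_len - done, out, out_cap,
                      &src_read, &out_len);
        if (out_len > 0)
            sink(sink_ctx, out, out_len);  /* copy before reusing the buffer */
        done += src_read;
        if (src_read == 0)
            break;                         /* no progress: stop retrying */
    } while (status == SKETCH_OUTPUT_BUFFER_FULL && done < src_len);
    *total_read = done;
    return status;
}

/* Toy stand-in for a converter: widens each byte to two bytes with a zero
   high byte, roughly as a 1-byte encoding would grow. */
OSStatus toy_step(const unsigned char *src, ByteCount src_len,
                  unsigned char *out, ByteCount out_cap,
                  ByteCount *src_read, ByteCount *out_len)
{
    ByteCount n = out_cap / 2;             /* characters that fit */
    if (n > src_len)
        n = src_len;
    for (ByteCount i = 0; i < n; i++) {
        out[2 * i] = 0;
        out[2 * i + 1] = src[i];
    }
    *src_read = n;
    *out_len = 2 * n;
    return (n < src_len) ? SKETCH_OUTPUT_BUFFER_FULL : SKETCH_NO_ERR;
}

/* Toy sink: appends each chunk to a caller-provided byte array. */
typedef struct { unsigned char *data; ByteCount len; ByteCount cap; } ByteSink;

void append_chunk(void *ctx, const unsigned char *chunk, ByteCount len)
{
    ByteSink *s = (ByteSink *)ctx;
    if (len > s->cap - s->len)
        len = s->cap - s->len;             /* truncate rather than overflow */
    memcpy(s->data + s->len, chunk, len);
    s->len += len;
}
```

In a real program the step function would call ConvertFromTextToUnicode and the loop would compare against the actual result codes. The sketch does not cover kTECPartialCharErr or kTECUnmappableElementErr; it returns any other status to the caller along with the number of source bytes consumed so far.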